Dimensionality Reduction

Dimensionality Reduction#

Normalization#

#@title Load quality controlled data 

adata_QC = sc.read_h5ad(filename= "/content/drive/MyDrive/scRNA_using_Python/Objects/sc_qc_filtered_covid.h5ad")

adata_QC

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambient_counts', 'counts'
    obsp: 'connectivities', 'distances'

#@title check count distribution :

p1 = sns.histplot(adata_QC.obs["total_counts"], bins=100, kde= True)

../../_images/b96e6ddbe8462f6869991331e7c259172a7c532d303bb6dd952efaddd16957e7.png

Note :

At this stage we have cleaned data according to filtered cells , corrected ambient RNA and doublet removed .

But data we had count Matrix (cells X Genes) , still variance due to sampling effect (cell capturing, library preparation, sequencing ) .

The difference we get between cells gene expression may be because of these sampling effects not by real difference . so we need to Normalize to correct for variance.

Normalize by 2 methods :

Log shifted Normalization

doesn’t prevent heterogenity
corrects for library depth

Pearson residual Normalization (prevents heterogenity)

prevent heterogenity
corrects for library depth

#@title Method-1 : Log shifted method :


scales_counts = sc.pp.normalize_total(adata_QC, target_sum=None, inplace=False)
# log1p transform
adata_QC.layers["log1p_norm"] = sc.pp.log1p(scales_counts["X"], copy=True)

adata_QC

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
p1 = sns.histplot(adata_QC.obs["total_counts"], bins=100, kde=True, ax=axes[0])
axes[0].set_title("Total counts")
p2 = sns.histplot(adata_QC.layers["log1p_norm"].sum(1), bins=100, kde=True, ax=axes[1])
axes[1].set_title("Shifted logarithm")
plt.show()

../../_images/50d1833a8d572b79890317f23945ef8d545e18a3ae7c9da35a8d822858cc77cd.png

#@title Method-2 : Analytic Pearson residual method ( APR )

Pearson Residual :

In pearson residual method the preprocessed counts are compared to the expected counts of a “null model” ( This Null model includes technical variance cells but no biological variability between cells.)

Pearson residuals :

genes that are not differentially expressed = will have variance close to 1.
gene is differentially expressed, it will deviate from the null model, residual variance >1 for that gene

> Pearson Residual method helps in :

Remove the technical variation that comes from library depth
stabilize the mean-variance relationship across genes, i.e. ensure that biological signal from both low and high expression genes can contribute similarly to downstream processing
genes that are homogenously expressed (like housekeeping genes) have small variance, while genes that are differentially expressed (like marker genes) have high variance

analytic_pearson = sc.experimental.pp.normalize_pearson_residuals(adata_QC, inplace=False)
adata_QC.layers["APR_counts"] = csr_matrix(analytic_pearson["X"])

adata_QC

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambient_counts', 'counts', 'log1p_norm', 'APR_counts'
    obsp: 'connectivities', 'distances'

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
p1 = sns.histplot(adata_QC.obs["total_counts"], bins=100, kde=True, ax=axes[0])
axes[0].set_title("Total counts")
p2 = sns.histplot(
    adata_QC.layers["APR_counts"].sum(1), bins=100, kde=True, ax=axes[1]
)
axes[1].set_title("APR_counts")
plt.show()

../../_images/b99d1e6176ec38d347ce62c057a765572707267c7bcae465a0ebb38d94828276.png

As you can see for this data Pearson Residual method performed well .

adata_QC

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambient_counts', 'counts', 'log1p_norm', 'APR_counts'
    obsp: 'connectivities', 'distances'

Feature Selection#

#@title select features

%%R
library(scry)

# Don/t run this in %%R because it is python code 

ro.globalenv["adata_QC"] = adata_QC

%%R
sce = devianceFeatureSelection(adata_QC, assay="X")
sce

class: SingleCellExperiment 
dim: 7650 3536 
metadata(13): _scvi_manager_uuid _scvi_uuid ... type_colors umap
assays(5): X ambient_counts counts log1p_norm APR_counts
rownames(7650): AL669831.5 FAM41C ... AL592183.1 AC004556.1
rowData names(14): gene_ids feature_types ... std binomial_deviance
colnames(3536): AGGGTCCCATGACCCG-1-0 ATTCCTAGTGACTGTT-1-0 ...
  GAGGCCTTCTCCTGCA-14-5 CCCTAACAGTTTCTTC-14-5
colData names(23): type sample ... leiden doublet
reducedDimNames(2): PCA UMAP
mainExpName: NULL
altExpNames(0):

binomial_deviance = ro.r("rowData(sce)$binomial_deviance").T

idx = binomial_deviance.argsort()[-4000:]
mask = np.zeros(adata_QC.var_names.shape, dtype=bool)
mask[idx] = True

adata_QC.var["highly_deviant"] = mask
adata_QC.var["binomial_deviance"] = binomial_deviance

adata_QC

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambient_counts', 'counts', 'log1p_norm', 'APR_counts'
    obsp: 'connectivities', 'distances'

Save QC fitered Normalized Feature selected object#

save_file = 'Objects/sc_QCNFS_covid.h5ad'
adata_QC.write_h5ad(save_file)

PCA,tSNE,UMAP#

#@title Load QC Filtered Normalized Feature selected object

adata_QCNFS = sc.read_h5ad(filename= "/content/drive/MyDrive/scRNA_using_Python/Objects/sc_QCNFS_covid.h5ad")

adata_QCNFS

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

PCA

PCA offers the advantage that it is highly interpretable and computationally efficient. However, as scRNA-seq datasets are rather sparse due to dropout events and therefore highly non-linear, visualization with the linear dimensionality reduction technique PCA is not very appropriate. PCA is typically used to select the top 10-50 PCs which are used for downstream analysis tasks.

t-SNE UMAP

t-distributed stochastic neighbor embedding (t-SNE) as it yielded the best overall performance. Uniform manifold approximation and projection (UMAP) showed the highest stability and separates best the original cell populations.

adata_for_PCA = adata_QCNFS.copy()

adata_for_PCA

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

# Perform Z scale Normalization to use for PCA
# use the count matrix layer zscale_norm_counts

# USe highly deviance genes as highly variable for PCA

# regress out unwanted variables
sc.pp.regress_out(adata_for_PCA, ['total_counts', 'pct_counts_mt'])

# scale data, clip values exceeding standard deviation 10.
sc.pp.scale(adata_for_PCA, max_value=10)

adata_for_PCA

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

#@title we use log nomalize data

adata_for_PCA.X = adata_for_PCA.layers["log1p_norm"]

adata_for_PCA

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

#@title set highl variable as highly deviance 

# setting highly variable as highly deviant to use scanpy 'use_highly_variable' argument in sc.pp.pca
adata_for_PCA.var["highly_variable"] = adata_for_PCA.var["highly_deviant"]

#@title Perform PCA 

sc.pp.pca(adata_for_PCA, svd_solver="arpack", use_highly_variable=True)

adata_for_PCA

AnnData object with n_obs × n_vars = 3536 × 7650
    obs: 'type', 'sample', 'batch', 'n_genes_by_counts', 'total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'n_counts', 'S_score', 'G2M_score', 'phase', '_scvi_batch', '_scvi_labels', 'prediction', 'leiden', 'doublet'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells', 'mean', 'std', 'highly_deviant', 'binomial_deviance', 'highly_variable'
    uns: '_scvi_manager_uuid', '_scvi_uuid', 'ambient_profile_Gene Expression', 'ambient_profile_all', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'prediction_colors', 'sample_colors', 'type_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'APR_counts', 'ambient_counts', 'counts', 'log1p_norm'
    obsp: 'connectivities', 'distances'

#@title Plot PCA

sc.pl.pca_scatter(adata_for_PCA, color="total_counts")

../../_images/e0e9d3fa3f14d278ce1fe8813cbec38430f360753503d3e8dcf5097551a486c5.png

sc.pl.pca(adata_for_PCA, color='sample', components = ['1,2','3,4','5,6','7,8'], ncols=4)

../../_images/81fea16e68e6b0f48e56dbc285f017a02f450ed21d53702277b88833f57d9b5f.png

To identify genes that contribute most to each PC, one can retreive the loading matrix information.

#Plot loadings
sc.pl.pca_loadings(adata_for_PCA, components=[1,2,3,4,5,6,7,8])

# OBS! only plots the positive axes genes from each PC!!

../../_images/7d87758fbb4416791f79d08ea90a4e89cec33638ede936725623196acb4d0c3f.png

The function to plot loading genes only plots genes on the positive axes. Instead plot as a heatmaps, with genes on both postive and negative side, one per pc, and plot their expression amongst cells ordered by their position along the pc.

adata_for_PCA.obs.head(4)

	type	sample	n_genes_by_counts	total_counts	pct_counts_in_top_20_genes	total_counts_mt	pct_counts_mt	total_counts_ribo	pct_counts_ribo	...	mt_outlier	n_counts	S_score	G2M_score	phase	prediction	leiden	doublet
AGGGTCCCATGACCCG-1-0	Covid	covid_1	2140	7698.0	24.123149	525.0	6.819953	2564.0	33.307353	...	False	5929.0	0.115495	-0.181939	S	singlet	2	singlet
ATTCCTAGTGACTGTT-1-0	Covid	covid_1	1808	7535.0	27.458527	470.0	6.237558	3397.0	45.082946	...	False	5855.0	0.299016	0.346879	G2M	singlet	5	singlet
CCTAAGACAGATTAAG-1-0	Covid	covid_1	2449	9025.0	19.789474	513.0	5.684211	2637.0	29.218837	...	False	7611.0	0.407661	0.150615	S	singlet	5	singlet
AATAGAGAGGGTTAGC-1-0	Covid	covid_1	392	937.0	38.313767	63.0	6.723586	63.0	6.723586	...	False	773.0	0.071607	-0.092472	S	singlet	8	singlet

4 rows × 23 columns

# adata.obsm["X_pca"] is the embeddings
# adata.uns["pca"] is pc variance
# adata.varm['PCs'] is the loadings

genes = adata_for_PCA.var['gene_ids']

for pc in [1,2,3,4]:
    g = adata_for_PCA.varm['PCs'][:,pc-1]
    o = np.argsort(g)
    sel = np.concatenate((o[:10],o[-10:])).tolist()
    emb = adata_for_PCA.obsm['X_pca'][:,pc-1]
    # order by position on that pc
    tempdata = adata_for_PCA[np.argsort(emb),]
    sc.pl.heatmap(tempdata, var_names = genes[sel].index.tolist(), groupby='sample', swap_axes = True, use_raw=False)

../../_images/c7190ea19f09d8f8252d124ad7c3f70e323b544254fc11d43891a04d24cbe4f8.png

../../_images/5305a7b4873082830d641fb406882f604decd425bad2466f646b80014d2f49ae.png

../../_images/ad8f3610eb06eb67970628cb9a8b5db32c4ca3105160189debe935ecfe1eb973.png

../../_images/a223d48cd01a2d09fb402811c9057e1e764d4e413105fb0618bfbbdefec08cb9.png

sc.pl.pca_variance_ratio(adata_for_PCA, log=True, n_pcs = 50)

../../_images/3cfd50ade7c1c6d06f12f281f8a3e7a1c539bbabd9491cbc03e335985ac6c99d.png

#@title t-SNE 

sc.tl.tsne(adata_for_PCA, use_rep="X_pca")

sc.pl.tsne(adata_for_PCA, color="sample")

../../_images/27ef69307435ab3a5944c2034853b7319ab1a3de897833eae10c79fc923b1cf0.png

#@title U-MAP

sc.pp.neighbors(adata_for_PCA)
sc.tl.umap(adata_for_PCA)

sc.pl.umap(adata_for_PCA, color="sample")

../../_images/0852a0f3b6ebb97170f81fec5ace3a72ea4a0e336c7f138f0845dc0b9e7b372b.png

#run with 10 components, save to a new object so that the umap with 2D is not overwritten.
umap10 = sc.tl.umap(adata_for_PCA, n_components=10, copy=True)

fig, axs = plt.subplots(1, 3, figsize=(10,4),constrained_layout=True)

sc.pl.umap(adata_for_PCA, color='sample',  title="UMAP", show=False, ax=axs[0])
sc.pl.umap(umap10, color='sample', title="UMAP10", show=False, ax=axs[1], components=['1,2'])
sc.pl.umap(umap10, color='sample', title="UMAP10", show=False, ax=axs[2], components=['3,4'])

<Axes: title={'center': 'UMAP10'}, xlabel='UMAP3', ylabel='UMAP4'>

../../_images/a9b0a502b8455b542c53c43965d5482a4de2125d55f627a4f49052c3019f952a.png

sc.pl.umap(adata_for_PCA, color=["CD3E", "CD4", "CD8A", "GNLY","NKG7", "MS4A1","CD14","LYZ","CST3","MS4A7","FCGR3A"])

../../_images/2c57f2aec3962843b36eb6349e1bca874cf8022e3e1c2458f5246664783f53fe.png

sc.pl.umap(
    adata_for_PCA,
    color=["total_counts", "pct_counts_mt", 'pct_counts_ribo', 'pct_counts_hb', "doublet"],
)

../../_images/1f782627f38a244d90fc95e38855471ec2753e4880e0a046ed88e2f420605d25.png

Save QC filtered feature selected , DM reduced data object#

save_file = 'Objects/sc_QCNFSDM_covid.h5ad'
adata_for_PCA.write_h5ad(save_file)